hCard-enable the comments in BlogEngine.NET 1.4.5

Published Aug 10, 2008

The hCard microformat makes contact information machine-readable. BlogEngine.NET 1.4.5 supports it in post comments. However, if you are writing your own custom theme, you need to add a little bit of code to your CommentView.ascx theme file.

The themes bundled in BlogEngine.NET 1.4.5 already have these small pieces of code embedded, so let’s take a look at the Standard theme’s CommentView.ascx file.

The containing <div> now has two classes: vcard and comment. The vcard class is new and is the one that triggers the hCard microformat. It looks like this:

<div id="id_<%=Comment.Id %>" class="vcard comment...

If the vcard class is added, machines will expect to find an hCard microformat within that <div>, but we need to add one more class to make it a valid hCard – the fn class name.

In the Standard theme’s CommentView.ascx file you can see where the name of the comment author is written. If the author supplied her website URL, a hyperlink is created; otherwise, a span tag is used. The hyperlink has a class attribute with two class names: fn and url. This tells hCard crawlers that the element holds both the full name and the URL of the contact. In the span, only the fn class name is needed.
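As a rough sketch of the author markup described above (not a copy of the Standard theme), the two branches could look like this in a custom CommentView.ascx. The property names Comment.Website and Comment.Author are assumptions here, so check them against the Comment class in your BlogEngine.NET version.

```aspx
<%-- Sketch only: Comment.Website and Comment.Author are assumed property names. --%>
<% if (Comment.Website != null) { %>
  <%-- Author supplied a website: the link carries both the fn and url class names. --%>
  <a href="<%= Comment.Website %>" class="url fn"><%= Server.HtmlEncode(Comment.Author) %></a>
<% } else { %>
  <%-- No website: a plain span with only the fn class name. --%>
  <span class="fn"><%= Server.HtmlEncode(Comment.Author) %></span>
<% } %>
```

Either element must sit inside the <div> that carries the vcard class for the hCard to be recognized.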

So, if you want to support microformats in your custom themes, take a look at the Standard theme’s CommentView.ascx file and make the appropriate modifications. BlogEngine.NET 1.4.5 already adds the appropriate classes to the avatar image and country flag, so you don’t have to do anything there.

Find semantic links in a web page

Published Jul 15, 2008

Imagine a visitor who enters his website URL into a textbox, and when he clicks the submit button, you are able to retrieve all kinds of information about him: his name, company info, online profiles, interests and so on, all from just a URL. It’s actually pretty easy if the website links to FOAF, APML or SIOC documents.

What you have to do is download the HTML from the website and look for <link> elements in the header that match FOAF, APML or SIOC type links. Then retrieve the URLs of those documents from the href attributes and load each one into an XML document. Now you can use XPath to find all the information you need.

Here is what a FOAF link element looks like:
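As a sketch of the usual FOAF autodiscovery convention, such an element might look like the following. The href is a placeholder, not a real document.

```html
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://example.com/foaf.rdf" />
```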

SIOC and APML links use the same attributes in the same way, so we can use the title attribute to figure out which kind of document a link points to. All we need is a method that uses regular expressions to retrieve the document URLs from the HTML.

The code

This is a method that finds all the semantic links of a certain type in an HTML string.

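 // Requires: using System; using System.Collections.ObjectModel; using System.Text.RegularExpressions;
 // Matches each <link> element whose title equals {0}, one match per element, anywhere in the HTML.
 // (A greedy "<head.*<link...>.*</head>" pattern yields at most one match for the whole header.)
 private const string PATTERN = "<link\\b([^>]*\\btitle=\"{0}\"[^>]*)>";
 // [^\"]* stops at the first closing quote, so attributes after href are not swallowed.
 private static readonly Regex HREF = new Regex("href=\"([^\"]*)\"", RegexOptions.IgnoreCase | RegexOptions.Compiled);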
   
 /// <summary> 
 /// Finds semantic links in a given HTML document. 
 /// </summary> 
 /// <param name="type">The type of link. Could be foaf, apml or sioc.</param> 
 /// <param name="html">The HTML to look through.</param> 
 /// <returns>The absolute URLs found in the matching link elements; empty if there are none.</returns>
 private static Collection<Uri> FindLinks(string type, string html) 
 { 
   // Escape the type so any regex metacharacters in it are matched literally.
   MatchCollection matches = Regex.Matches(html, string.Format(PATTERN, Regex.Escape(type)), RegexOptions.IgnoreCase | RegexOptions.Singleline);
   Collection<Uri> urls = new Collection<Uri>(); 
   
   foreach (Match match in matches) 
   { 
     if (match.Groups.Count == 2) 
     { 
       string link = match.Groups[1].Value; 
       Match hrefMatch = HREF.Match(link); 
   
        // Groups.Count is 2 even when nothing matched, so test Success instead.
        if (hrefMatch.Success)
       { 
         Uri url; 
         string value = hrefMatch.Groups[1].Value; 
         if (Uri.TryCreate(value, UriKind.Absolute, out url)) 
         { 
           urls.Add(url); 
         } 
       } 
     } 
   } 
   
   return urls; 
 } 

Example

To find all the FOAF links in a page you can write something like this:

 using (WebClient client = new WebClient()) 
 { 
   string html = client.DownloadString(txtUrl.Text); 
   Collection<Uri> col = FindLinks("foaf", html); 
   
   foreach (Uri url in col) 
   { 
     XmlDocument doc = new XmlDocument(); 
     doc.Load(url.ToString()); 
     Response.Write(Server.HtmlEncode(doc.OuterXml)); 
   } 
 } 

If you want to search for APML or SIOC then just replace “foaf” with either “apml” or “sioc” in the method parameter. You might also want to take a look at my experimental FOAF parser class.
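The XPath step mentioned earlier can be sketched as follows. This is a minimal example, assuming a FOAF document that uses the standard foaf namespace; the sample XML, the FoafSketch class and the ReadName method are made up for illustration and are not part of the parser class mentioned above.

```csharp
using System;
using System.Xml;

public static class FoafSketch
{
    // Made-up FOAF document, used only to keep the example self-contained.
    public const string SampleFoaf =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" " +
        "xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">" +
        "<foaf:Person><foaf:name>Jane Doe</foaf:name>" +
        "<foaf:homepage rdf:resource=\"http://example.com/\" /></foaf:Person></rdf:RDF>";

    // Returns the first foaf:name inside a foaf:Person, or null if there is none.
    public static string ReadName(XmlDocument doc)
    {
        // FOAF elements are namespaced, so XPath needs a prefix-to-namespace mapping.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("foaf", "http://xmlns.com/foaf/0.1/");

        XmlNode node = doc.SelectSingleNode("//foaf:Person/foaf:name", ns);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleFoaf);
        Console.WriteLine(ReadName(doc)); // prints "Jane Doe"
    }
}
```

In the example above, the doc loaded from each URL returned by FindLinks could be passed to ReadName instead of being written straight to the response.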